Block-Serial Finite Field Multipliers



Background of the Invention 

The present invention is generally directed to a circuit and method for multiplying
elements of a finite field. More particularly, the present invention is directed to a process for
multiplier design which provides a mechanism for trading off circuit complexity for circuit
speed. Even more particularly, the present invention is directed to a mechanism which partitions
one of the multiplicands into blocks. Multiplication by these blocks is easier, and the size of the
blocks is controllable as a design choice, with smaller blocks having simpler circuits but requiring
a larger number of operation cycles. The opposite is true for larger blocks.

Finite fields have been used extensively in the construction of error correcting codes for 
many years. Recently, finite fields have also been applied to public-key cryptography using 
elliptic curves. A major difference in the practical applications of finite fields for error 
correcting codes and cryptography is that the size of the finite fields is significantly larger in 
cryptography than in error correcting codes. Accordingly, the implementation of finite field
arithmetic for fields with large numbers of elements has been of great interest lately.

For finite fields of characteristic 2, addition is simply carried out by XOR (exclusive OR)
operations. Multiplication is more involved. There are two general design approaches. In a
bit-parallel design, the product terms are obtained in parallel by a set of AND operations
followed by XOR operations, and the operations may be carried out in one machine cycle in
hardware as described in E. Mastrovito, "VLSI design for multiplication over finite fields
GF(2^m)," Lecture Notes in Computer Science, vol. 357, pp. 297-309, Berlin: Springer-Verlag,
March 1989. However, for a large finite field, it may take a considerable number of circuits to
implement such a design. A bit-serial multiplier is based on the shift register design concept as
described in W. W. Peterson and E. J. Weldon, Error-Correcting Codes, second edition, MIT
Press, 1972, in which the components of the multiplier are processed sequentially one bit at a
time to produce partial products. It takes k cycles to produce the final product if there are k
components in each of the field elements. The advantage is that the number of circuits can be
greatly reduced.

Recently, a third approach to the design of finite field multipliers, called hybrid
multiplication, has been presented, as described in C. Paar and P. Soria-Rodriguez, "Fast
arithmetic architectures for public-key algorithms over Galois fields GF((2^n)^m)," Advances in
Cryptology - EUROCRYPT '97, W. Fumy, ed., pp. 363-378, 1997, and in C. Paar,
P. Fleischmann, and P. Soria-Rodriguez, "Fast arithmetic for public-key algorithms in Galois
fields with composite exponents," IEEE Transactions on Computers, vol. 48, pp. 1025-1034,
October 1999. The hybrid multiplication approach is only applicable if the finite field is
composite so that it contains a proper subfield other than GF(2). A finite field of characteristic
two is composite if the base two logarithm of the number of field elements is not a prime number.
Consider the finite field GF(2^k) with 2^k field elements. A field element is represented by a
k component vector. If k is composite, say k = mn, then there is a natural way to represent the
field elements with m components, with each component being an element of the subfield GF(2^n).
Hybrid multipliers that can be executed in m = k/n cycles for these composite fields have been
presented as described in the articles by Paar et al. listed above.

For cryptographic applications, k is a large number, for example, a number greater than
160 for elliptic curves. It is desirable to design a multiplier that can complete a multiplication
operation in fewer than k cycles and does not require a lot of circuits. Hybrid multiplication
provides a solution. However, its application is limited to only special composite finite fields. In
addition, cryptography based on composite finite fields is not preferred for security
considerations. In particular, the values of k for the five binary finite fields recommended by the
US government for the digital signature standard published in FIPS PUB 186-2, January 27, 2000,
are all primes. If k is a prime, there is no known algorithm that executes a multiplication in
more than one but fewer than k cycles.

In this application, we present a block-serial method for constructing finite field
multipliers for GF(2^k), where k can be either prime or composite. The design is flexible and
provides a mechanism for trading off between speed and circuit complexity. One can now
always construct a multiplier to execute a multiplication in any number of cycles between 2 and
k/2. The present method is particularly applicable to cryptographic systems, especially for
applications such as smart cards where circuit space is limited and performance is important. For
composite values of k, the present design also offers circuit reduction, particularly when
compared to the use of hybrid multipliers based on subfields.

Summary of the Invention 

In accordance with a preferred embodiment of the present invention, a finite field
multiplier is constructed to multiply together two elements from the finite field GF(2^k). The field
elements are represented by binary polynomials a(x) and b(x) and multiplication is carried out
modulo an irreducible polynomial p(x) of degree k. The preferred circuit of the present invention
includes a first multiplier, a modulo 2 summer, a storage means, and a second multiplier. The
first and second multipliers are each much simpler than they would be in alternate designs. The
first multiplier multiplies b(x) by A_j(x), where T-1 >= j >= 0 and where A_j(x) is a polynomial
based on a sequence of n coefficients from the polynomial for a(x) where k is composite and, in
fact, is equal to nT. Thus, each A_j(x) is a polynomial of degree n-1 with n coefficients. In fact, if
a(x) is given by \sum_{i=0}^{nT-1} a_i x^i, then A_j(x) is given by \sum_{i=0}^{n-1} a_{jn+i} x^i.
The output of the first multiplier is supplied as the first of two inputs to a summer (readily
implemented as a plurality of XOR gates). The output of the summer is stored for one of T cycles
of operation in a storage means, such as a register. The output of the storage means is supplied
to a second multiplier which multiplies the storage means output by x^n and feeds its output to
the summer, thus closing a feedback loop.

Accordingly, it is an object of the present invention to provide flexibility in the design
and construction of finite field element multipliers.

It is also an object of the present invention to provide multipliers which can operate faster
than bit-serial designs.

It is yet another object of the present invention to provide multipliers which are less
complex, in terms of circuits required, than fully parallel designs.

It is a still further object of the present invention to provide binary finite field multipliers
even when the field size is not composite, that is, when the base 2 logarithm of the field size is a
prime number.

It is an object of the present invention to provide multiplier circuits which are useful in
cryptographic applications.

It is yet another object of the present invention to provide multiplier circuits which are
useful in error correction applications.

It is a still further object of the present invention to provide multiplier circuits for
polynomials wherein the multiplication is modulo an irreducible polynomial.

Lastly, but not limited hereto, it is an object of the present invention to provide multiplier
designs which are operable in a wide ranging number of cycles.

Description of the Drawings 

The subject matter which is regarded as the invention is particularly pointed out and
distinctly claimed in the concluding portion of the specification. The invention, however, both as
to organization and method of practice, together with the further objects and advantages thereof,
may best be understood by reference to the following description taken in connection with the
accompanying drawings in which:

Figure 1 is a block diagram of a circuit which implements bit-parallel multiplication of
two polynomial field elements, a(x) and b(x), modulo p(x) = x^3 + x + 1 over GF(2);

Figure 2 is a block diagram of a circuit which implements the same multiplication
which is shown in Figure 1, now being carried out in bit-serial fashion;

Figure 3 is a block diagram of a bit-serial multiplier for polynomials a(x) and b(x) modulo
p(x) = x^2 + x + 1 over GF(2^3);

Figure 4 is a block diagram of a bit-serial multiplier for polynomials a(x) and b(x) modulo
p(x) over GF(2^n), which is a more general structure than that shown in Figure 3;

Figure 5 is a flowchart indicating the structure of block-serial multiplication;

Figure 6 is a block diagram of a circuit for block-serial multiplication in accordance with
the present invention; and

Figure 7 is a block diagram of a block-serial multiplier of a(x) and b(x) modulo
x^6 + x + 1.

Detailed Description of the Invention

For a proper understanding of the present invention, consider the field GF(q^m), where q is
either 2 or a power of 2. An element of F = GF(q^m) is represented as a polynomial over GF(q) of
degree m-1. Thus, a(x) = a_{m-1} x^{m-1} + ... + a_1 x + a_0, with coefficients a_i in GF(q), is an
element of F. The element can also be represented by the vector (a_{m-1}, ..., a_1, a_0).

The multiplication of two elements a(x) and b(x) in F is the product c(x) = a(x) b(x)
modulo p(x), where p(x) is an irreducible polynomial of degree m over GF(q). For example, for
explanatory purposes, consider q = 2, m = 3, F = GF(2^3), and p(x) = x^3 + x + 1. Let
a(x) = a_2 x^2 + a_1 x + a_0, b(x) = b_2 x^2 + b_1 x + b_0, and c(x) = c_2 x^2 + c_1 x + c_0. Then

\[
\begin{aligned}
c(x) &= a(x)\,b(x) \bmod p(x) \\
     &= a_2 b_2 x^4 + (a_2 b_1 + a_1 b_2) x^3 + (a_2 b_0 + a_1 b_1 + a_0 b_2) x^2
        + (a_1 b_0 + a_0 b_1) x + a_0 b_0 \bmod p(x) \\
     &= (a_2 b_0 + a_1 b_1 + a_0 b_2 + a_2 b_2) x^2 \\
     &\quad + (a_1 b_0 + a_0 b_1 + a_2 b_1 + a_1 b_2 + a_2 b_2) x \\
     &\quad + (a_0 b_0 + a_2 b_1 + a_1 b_2)
\end{aligned}
\]

Note that addition in the binary field GF(2) is the same as XOR. Figure 1 is a bit-parallel
implementation of c(x). It requires 9 AND circuits and 8 two-way XOR circuits. It takes T = 1
cycle to produce a product.
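As an illustrative check, and not a description of Figure 1 itself, the closed-form coefficients above can be evaluated in software and compared against a schoolbook multiply-then-reduce. The sketch below assumes a hypothetical encoding in which bit i of a Python integer holds the coefficient of x^i; the function names are illustrative only.

```python
def mul_gf8_parallel(a, b):
    """Bit-parallel product in GF(2^3) modulo x^3 + x + 1 (closed-form coefficients)."""
    a0, a1, a2 = a & 1, (a >> 1) & 1, (a >> 2) & 1
    b0, b1, b2 = b & 1, (b >> 1) & 1, (b >> 2) & 1
    hi = a2 & b2                        # coefficient of x^4 before reduction
    shared = (a2 & b1) ^ (a1 & b2)      # coefficient of x^3, reused by c1 and c0
    c2 = (a2 & b0) ^ (a1 & b1) ^ (a0 & b2) ^ hi
    c1 = (a1 & b0) ^ (a0 & b1) ^ shared ^ hi
    c0 = (a0 & b0) ^ shared
    return (c2 << 2) | (c1 << 1) | c0


def mul_reference(a, b, p=0b1011, k=3):
    """Schoolbook carry-less multiply, then reduce modulo p(x) of degree k."""
    prod = 0
    for i in range(k):
        if (b >> i) & 1:
            prod ^= a << i
    for d in range(2 * k - 2, k - 1, -1):
        if (prod >> d) & 1:
            prod ^= p << (d - k)
    return prod


# x * x^2 = x^3 = x + 1 modulo x^3 + x + 1
print(bin(mul_gf8_parallel(0b010, 0b100)))   # 0b11
```

As written, `mul_gf8_parallel` uses nine AND operations and eight XOR operations, in line with the circuit counts stated for the bit-parallel design.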

A bit-serial multiplier is shown in Figure 2. Initially, the registers c_2, c_1, and c_0 are
clear. Then the components of a(x) are multiplied (AND operation) by the components of b(x)
and sequentially fed into the registers one clock cycle at a time. The feedback connections at
the bottom of the diagram correspond to the last two terms of p(x) = x^3 + x + 1. At the end of
three cycles, the registers contain the final product terms of a(x)b(x) mod p(x). This multiplier
has 3 AND circuits and 4 two-way XOR circuits. It requires T = 3 cycles to produce the product.
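One common way to model a bit-serial multiplier of this kind in software is a shift-and-add loop that consumes one bit of a(x) per cycle, most significant bit first. The sketch below is an assumption about the data flow, not a transcription of the wiring in Figure 2; polynomials are encoded as integers with bit i holding the coefficient of x^i, and `mul_bit_serial` is an illustrative name.

```python
def mul_bit_serial(a, b, p=0b1011, k=3):
    """Multiply a(x) by b(x) modulo p(x) in k cycles, one bit of a(x) per cycle."""
    c = 0                                # registers start cleared
    for i in range(k - 1, -1, -1):       # one clock cycle per bit, high bit first
        c <<= 1                          # shift: multiply the register contents by x
        if (c >> k) & 1:
            c ^= p                       # feedback: replace x^k by the low terms of p(x)
        if (a >> i) & 1:
            c ^= b                       # AND a_i with each b_j, XOR into the registers
    return c


print(bin(mul_bit_serial(0b010, 0b100)))                   # GF(2^3): x * x^2 = x + 1 -> 0b11
print(bin(mul_bit_serial(0b100000, 0b10, 0b1000011, 6)))   # GF(2^6): x^5 * x = x + 1 -> 0b11
```

The default arguments correspond to p(x) = x^3 + x + 1; passing `0b1000011` and `k=6` models the field defined by x^6 + x + 1.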

Now consider GF(q^m) as another example where q = 2, m = 6, and p(x) = x^6 + x + 1.
Following a similar analysis from the previous example, a bit-parallel multiplier producing a
product in one cycle requires 36 AND circuits and 35 XOR circuits. A bit-serial multiplier
producing a product in T = 6 cycles requires 6 AND circuits and 7 XOR circuits.

Since 6 is a composite number, GF(2^6) can be represented as F = GF(q^m) = GF((2^3)^2) with
m = 2 and q = 2^3. In this case, F is a composite field containing the subfield GF(2^3). The
irreducible polynomial p(x) = x^2 + x + 1 over GF(2^3) may be used to define F. The field elements
are represented as polynomials of degree 1 with coefficients in GF(q), where q = 2^3. A hybrid
multiplier (see the cited articles by Paar et al.) based on the composite field is shown in Figure 3.
Here, each of the parameters a_i, b_j, and c_i is an element of GF(q) and is a 3-bit vector. Each of the
registers c_1 and c_0 is actually a 3-bit register. The multiplication of a_i and b_j in Figure 3
represents the circuits shown in Figure 1. The total number of AND circuits is 2 x 9 = 18. The
number of two-way XOR circuits is (2 x 8) + (3 x 3) = 25. It takes T = 2 cycles to produce a product.
A block-serial multiplier, in accordance with the present invention, is presented below and is
seen to require 18 AND circuits and 23 XOR circuits with T = 2. A comparison of performance
and circuits is shown in the following table:



Method          Clock Cycles    AND Circuits    XOR Circuits
Bit parallel    1               36              35
Hybrid          2               18              25
Block serial    2               18              23
Bit serial      6               6               7



A general hybrid multiplier for field elements in GF(q^m) with q = 2^n is shown in Figure 4.
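The two-cycle hybrid multiplication over GF((2^3)^2) can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuit of Figure 3 or Figure 4: an element is a hypothetical pair (a_1, a_0) of 3-bit integers, the subfield GF(2^3) is taken modulo x^3 + x + 1, the extension is taken modulo y^2 + y + 1, and the function names are illustrative.

```python
def mul_gf8(a, b):
    """Product in the subfield GF(2^3) modulo x^3 + x + 1."""
    prod = 0
    for i in range(3):
        if (b >> i) & 1:
            prod ^= a << i
    for d in (4, 3):
        if (prod >> d) & 1:
            prod ^= 0b1011 << (d - 3)
    return prod


def mul_hybrid(a, b):
    """Multiply (a1, a0) by (b1, b0) in GF((2^3)^2) modulo y^2 + y + 1, in two cycles."""
    b1, b0 = b
    c1 = c0 = 0
    for a_i in a:                      # a = (a1, a0): high coefficient first
        c1, c0 = c1 ^ c0, c1           # multiply the register pair by y, using y^2 = y + 1
        c1 ^= mul_gf8(a_i, b1)         # two subfield multipliers operate in each cycle
        c0 ^= mul_gf8(a_i, b0)
    return c1, c0


print(mul_hybrid((1, 0), (1, 0)))      # y * y = y + 1 -> (1, 1)
```

Each cycle in this model uses two subfield multiplications and three 3-bit XOR additions, which mirrors the way the hybrid circuit counts are tallied in the text.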

Attention is now specifically directed to block-serial multipliers. We do not consider
whether a finite field contains an extension of the binary field GF(2) as a subfield. We represent
elements of GF(2^k) as k-bit binary vectors. To compute a(x)b(x) mod p(x), the present process
divides a(x) into T blocks. The size n of each block is the smallest integer greater than or equal
to k divided by T. If k is not a multiple of T, the high-order block is padded
with (nT - k) zeros at the high-order positions. The T blocks representing a(x) are
sequentially multiplied by b(x) and stored in a register with feedback connections. It takes T
clock cycles to produce a product. The cases of T = 1 and T = k reduce to bit-parallel and
bit-serial finite field multiplication, respectively.
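The partition into blocks can be illustrated with a short sketch. It assumes, as a convention of this example only, that a field element is held in a Python integer whose bit i is the coefficient a_i; `split_blocks` is a hypothetical helper name.

```python
def split_blocks(a, k, T):
    """Split a k-bit element into T blocks of n = ceil(k / T) bits, low block first."""
    n = -(-k // T)                       # smallest integer >= k / T
    mask = (1 << n) - 1
    # Block j holds bits jn .. jn+n-1 of a; positions at or above k read as zero padding.
    blocks = [(a >> (j * n)) & mask for j in range(T)]
    return n, blocks


n, blocks = split_blocks(0b101110, k=6, T=2)
print(n, [bin(A) for A in blocks])       # 3 ['0b110', '0b101']
```

With k = 15 and T = 2 the same helper returns n = 8, and the top bit of the high-order block is always zero, which corresponds to the padding described above.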

Let a(x) = A_0(x) + A_1(x) x^n + ... + A_{T-1}(x) x^{(T-1)n}, where the polynomials A_0(x),
A_1(x), ..., A_{T-1}(x) are of degree n-1. In general, if a(x) = \sum_{i=0}^{nT-1} a_i x^i, then it
can be considered in T blocks as

\[
\sum_{j=0}^{T-1} \sum_{i=0}^{n-1} a_{jn+i}\, x^{jn+i} = \sum_{j=0}^{T-1} A_j(x)\, x^{jn},
\quad \text{where } A_j(x) = \sum_{i=0}^{n-1} a_{jn+i}\, x^i \text{ and } 0 \le j \le T-1 .
\]

The multiplication of a(x) and b(x) modulo p(x) is expressed as

\[
\begin{aligned}
c(x) &= a(x)\,b(x) \bmod p(x) \\
     &= A_0(x) b(x) + A_1(x) b(x) x^n + A_2(x) b(x) x^{2n} + \dots + A_{T-1}(x) b(x) x^{(T-1)n} \bmod p(x) \\
     &= \bigl(\bigl(\dots\bigl(A_{T-1}(x) b(x) x^n + A_{T-2}(x) b(x)\bigr) x^n + \dots + A_1(x) b(x)\bigr) x^n + A_0(x) b(x) \bmod p(x)
\end{aligned}
\]

The product is the sum of T terms and each term involves the multiplication of a degree n-1
polynomial and a degree k-1 polynomial. Basic hardware is provided herein to perform three
functions: multiplication of A(x)b(x) mod p(x), where A(x) is a polynomial of degree n-1;
addition of two k-bit polynomials; and multiplication of a degree k-1 polynomial by x^n modulo
p(x). The polynomials A_0(x), A_1(x), ..., A_{T-1}(x) are fed into the basic hardware sequentially in T
cycles to compute the final product c(x). A flow chart for the multiplication algorithm is shown
in Figure 5 and a block diagram for hardware implementation is shown in Figure 6, where c(x) is
an accumulator with XOR circuits between registers to perform polynomial additions as
illustrated in the next example.
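The nested form of the product suggests a simple software model of the three hardware functions. The following sketch is written under stated assumptions rather than taken from Figures 5 and 6: polynomials are integers with bit i holding the coefficient of x^i, the blocks are consumed from A_{T-1}(x) down to A_0(x) as the nested expression requires, and the helper names are illustrative.

```python
def clmul(a, b):
    """Carry-less (GF(2)) polynomial product of a(x) and b(x)."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    return prod


def reduce_mod(v, p, k):
    """Reduce v(x) modulo p(x), where p(x) has degree k."""
    for d in range(v.bit_length() - 1, k - 1, -1):
        if (v >> d) & 1:
            v ^= p << (d - k)
    return v


def block_serial_mul(a, b, p, k, T):
    """Compute a(x) b(x) mod p(x) in T cycles using blocks of n = ceil(k / T) bits."""
    n = -(-k // T)
    mask = (1 << n) - 1
    c = 0                                             # accumulator register
    for j in range(T - 1, -1, -1):                    # one cycle per block, high block first
        A_j = (a >> (j * n)) & mask
        feedback = reduce_mod(c << n, p, k)           # c(x) x^n mod p(x)
        partial = reduce_mod(clmul(A_j, b), p, k)     # A_j(x) b(x) mod p(x)
        c = feedback ^ partial                        # modulo 2 sum
    return c


p6 = 0b1000011                                        # x^6 + x + 1
print([block_serial_mul(0b101101, 0b110011, p6, 6, T) for T in (1, 2, 3, 6)])
```

The four printed entries are equal, since the choice of T changes only how many cycles are used and not the product. Setting T = 1 or T = k in this model corresponds to the bit-parallel and bit-serial extremes.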

Consider as a further example the situation in which k = 6, T = 2, and p(x) = x^6 + x + 1.
We have n = k/T = 3. Polynomial a(x) is divided into two groups of 3 bits as a(x) = A_0(x) +
A_1(x) x^3, where A_0(x) and A_1(x) are of degree 2. The multiplication of a degree k-1 polynomial
b(x) by a degree n-1 polynomial is implemented in parallel. Let A(x) = a_0 + a_1 x + a_2 x^2. We have

\[
\begin{aligned}
d(x) &= A(x)\,b(x) \bmod p(x) \\
     &= A(x)\,(b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5) \bmod p(x) \\
     &= d_0 + d_1 x + d_2 x^2 + d_3 x^3 + d_4 x^4 + d_5 x^5 \\
     &= a_0 b_0 + a_1 b_5 + a_2 b_4 \\
     &\quad + (a_0 b_1 + a_1 b_0 + a_1 b_5 + a_2 b_4 + a_2 b_5)\, x \\
     &\quad + (a_0 b_2 + a_1 b_1 + a_2 b_0 + a_2 b_5)\, x^2 \\
     &\quad + (a_0 b_3 + a_1 b_2 + a_2 b_1)\, x^3 \\
     &\quad + (a_0 b_4 + a_1 b_3 + a_2 b_2)\, x^4 \\
     &\quad + (a_0 b_5 + a_1 b_4 + a_2 b_3)\, x^5
\end{aligned}
\]

Thus, A(x)b(x) mod p(x) can be implemented using 18 AND circuits and 14 two-way XOR circuits
(note that the XOR of a_1 b_5 and a_2 b_4 is shared between the d_0 and d_1 terms). The function
c(x)x^n mod p(x) is equal to

\[
\begin{aligned}
c(x)\,x^3 \bmod p(x)
  &= (c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5)\, x^3 \bmod (x^6 + x + 1) \\
  &= c_3 + (c_3 + c_4)\, x + (c_4 + c_5)\, x^2 + (c_0 + c_5)\, x^3 + c_1 x^4 + c_2 x^5
\end{aligned}
\]
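As a cross-check of the two closed-form expressions above, the following sketch evaluates them directly and chains them into the two-cycle multiplication for k = 6. It is an illustrative model under an assumed encoding (bit i of an integer holds the coefficient of x^i), not a netlist of Figure 7, and the function names are hypothetical.

```python
def block_product(A, b):
    """d(x) = A(x) b(x) mod x^6 + x + 1 for a 3-bit block A(x), from the closed form."""
    a0, a1, a2 = [(A >> i) & 1 for i in range(3)]
    b0, b1, b2, b3, b4, b5 = [(b >> i) & 1 for i in range(6)]
    shared = (a1 & b5) ^ (a2 & b4)        # coefficient of x^6, used by both d0 and d1
    top = a2 & b5                         # coefficient of x^7, used by both d1 and d2
    d = [
        (a0 & b0) ^ shared,
        (a0 & b1) ^ (a1 & b0) ^ shared ^ top,
        (a0 & b2) ^ (a1 & b1) ^ (a2 & b0) ^ top,
        (a0 & b3) ^ (a1 & b2) ^ (a2 & b1),
        (a0 & b4) ^ (a1 & b3) ^ (a2 & b2),
        (a0 & b5) ^ (a1 & b4) ^ (a2 & b3),
    ]
    return sum(bit << i for i, bit in enumerate(d))


def times_x3(c):
    """c(x) x^3 mod x^6 + x + 1, from the closed form."""
    c0, c1, c2, c3, c4, c5 = [(c >> i) & 1 for i in range(6)]
    out = [c3, c3 ^ c4, c4 ^ c5, c0 ^ c5, c1, c2]
    return sum(bit << i for i, bit in enumerate(out))


def mul_two_cycles(a, b):
    """a(x) b(x) mod x^6 + x + 1 with a(x) = A_0(x) + A_1(x) x^3, high block first."""
    c = 0
    for A in (a >> 3, a & 0b111):
        c = times_x3(c) ^ block_product(A, b)
    return c


print(bin(mul_two_cycles(0b100000, 0b000010)))   # x^5 * x = x^6 = x + 1 -> 0b11
```

In this model `block_product` uses 18 AND and 14 XOR operations and `times_x3` uses 3 XOR operations; the accumulation in `mul_two_cycles` accounts for the remaining 6 XORs per cycle.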

Thus, the multiplier in Figure 6 becomes Figure 7 for this example. It requires 18 AND circuits
and 14 + 9 = 23 two-way XOR circuits. As compared to the hybrid multiplier based on the
subfield GF(2^3), the multiplier in Figure 7 has 2 fewer XOR circuits.

Consider an example of a larger finite field F = GF(2^15) that contains GF(2^3) as a subfield.
A hybrid multiplier based on the subfield with p(x) = x^5 + x^2 + 1 requires 45 AND circuits and
58 XOR circuits. The block-serial multiplier based on p(x) = x^15 + x + 1 requires 45 AND circuits
and 39 XOR circuits. Both multipliers take 5 cycles to produce a product. The block-serial
multiplier requires fewer XOR circuits than the hybrid multiplier. Since there are only two proper
subfields, namely GF(2^3) and GF(2^5), aside from GF(2), a hybrid multiplier can only be designed
to produce a product in 3 or 5 clock cycles. The block-serial multiplier design is more flexible.
It can be designed to produce a product in 2, 3, 4, 5, 6, 7 or 8 cycles. For example, to design a
block-serial multiplier that produces a product every 2 clock cycles, the multiplier a(x) is divided
into two blocks of size 8. That is, n = 8 and a(x) = A_0(x) + A_1(x) x^8, where both A_0(x) and
A_1(x) are of degree 7. Since there are only 15 bits in a field element, the highest order term,
i.e., the coefficient of the x^7 term, of A_1(x) is set to zero. There is no hybrid multiplier that
produces a product in 2 cycles.

The new multiplication design can be applied to the finite field GF(2^k) regardless of the
value of k.

The coefficients of c(x)x^3 mod x^6 + x + 1 can be expressed as

\[
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{bmatrix}
\]

where the column vectors of the 6x6 matrix represent (x^3, x^4, x^5, x^6, x^7, x^8) mod x^6 + x + 1.
In the general case, c(x)x^n mod p(x) can be expressed as the product of a matrix M and a column
vector containing the coefficients of c(x) as its components. The columns of the matrix M correspond
to (x^n mod p(x), x^{n+1} mod p(x), ..., x^{n+k-1} mod p(x)). Matrix M can be mapped directly into XOR
circuits for the logic block c(x)x^n mod p(x) in Figure 6.
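The construction of M can be sketched in a few lines. This is an illustrative model under stated assumptions: integers encode polynomials with bit i as the coefficient of x^i, and `xpow_mod`, `build_M` and `apply_M` are hypothetical names. Row i of the returned matrix lists which coefficients of c(x) are XORed to form coefficient i of c(x)x^n mod p(x).

```python
def xpow_mod(e, p, k):
    """x^e mod p(x), where p(x) has degree k."""
    v = 1
    for _ in range(e):
        v <<= 1
        if (v >> k) & 1:
            v ^= p
    return v


def build_M(n, p, k):
    """k x k binary matrix whose column j is x^(n+j) mod p(x), low coefficient in row 0."""
    cols = [xpow_mod(n + j, p, k) for j in range(k)]
    return [[(cols[j] >> i) & 1 for j in range(k)] for i in range(k)]


def apply_M(M, c):
    """Multiply M by the coefficient vector of c(x) over GF(2)."""
    out = 0
    for i, row in enumerate(M):
        bit = 0
        for j, m_ij in enumerate(row):
            bit ^= m_ij & (c >> j) & 1
        out |= bit << i
    return out


for row in build_M(3, 0b1000011, 6):     # n = 3, p(x) = x^6 + x + 1
    print(row)
```

For n = 3 and p(x) = x^6 + x + 1 the printed rows reproduce the 6x6 matrix given for c(x)x^3. Each 1 beyond the first in a row corresponds to one two-way XOR gate in the logic block.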

Accordingly, it is seen that all of the objects stated above have been met in the system,
circuits, and methods of the present invention. In particular, it is seen that finite field element
multipliers can be built for any field of the form GF(2^k) even if k is not a composite number.
Furthermore, it is seen that the present technique of considering one of the multiplicands in block
form permits circuits to operate over T = 1 cycle, T = k cycles, and various numbers of cycles in
between, where k = nT. Each block of one of the multiplicands is readily seen to be representable
by a polynomial of degree n-1 with n independent coefficients. Since n < k, multiplier design is
simplified.

While the invention has been described in detail herein in accordance with certain
preferred embodiments thereof, many modifications and changes therein may be effected by
those skilled in the art. Accordingly, it is intended by the appended claims to cover all such
modifications and changes as fall within the true spirit and scope of the invention.